TR(1L) MISC. REFERENCE MANUAL PAGES TR(1L)
NAME
tr - translate or delete characters
SYNOPSIS
tr [-cst] [--complement] [--squeeze-repeats]
[--truncate-set1] string1 string2
tr {-s,--squeeze-repeats} [-c] [--complement] string1
tr {-d,--delete} [-c] string1
tr {-d,--delete} {-s,--squeeze-repeats} [-c] [--complement]
string1 string2
DESCRIPTION
This manual page documents the GNU version of tr. tr copies
the standard input to the standard output, performing one of
the following operations:
o translate, and optionally squeeze repeated characters
in the result
o squeeze repeated characters
o delete characters
o delete characters, then squeeze repeated characters
from the result.
The string1 and (if given) string2 arguments define ordered
sets of characters, referred to below as set1 and set2.
These sets are the characters of the input that tr operates
on. The --complement (-c) option replaces set1 with its
complement (all of the characters that are not in set1).
SPECIFYING SETS OF CHARACTERS
The format of the string1 and string2 arguments resembles
the format of regular expressions; however, they are not
regular expressions, only lists of characters. Most
characters simply represent themselves in these strings, but
the strings can contain the shorthands listed below, for
convenience. Some of them can be used only in string1 or
string2, as noted below.
Backslash escapes. A backslash followed by a character not
listed below causes an error message.
\a Control-G.
\b Control-H.
\f Control-L.
\n Control-J.
\r Control-M.
\t Control-I.
Sun Release 4.1 Last change: 1
\v Control-K.
\ooo The character with the value given by ooo, which is 1
to 3 octal digits.
\\ A backslash.
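As a rough illustration of the escapes listed above, the following sketch replaces tabs with spaces twice: once with \t and once with the equivalent octal escape \011 (Control-I). The sample input and the variable names are invented for this example.

```shell
# Replace every tab with a space, written two equivalent ways.
with_escape=$(printf 'a\tb\tc\n' | tr '\t' ' ')
with_octal=$(printf 'a\tb\tc\n' | tr '\011' ' ')

printf '%s\n' "$with_escape"   # a b c
printf '%s\n' "$with_octal"    # a b c
```

The escapes are quoted so that the shell passes the backslash through to tr unchanged.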
Ranges. The notation `m-n' expands to all of the characters
from m through n, in ascending order. m should collate
before n; if it doesn't, an error results. As an example,
`0-9' is the same as `0123456789'. Ranges can optionally be
enclosed in square brackets, which has no effect but is
supported for compatibility with historical System V versions
of tr.
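The range notation can be tried out with a sketch like the one below, which uppercases the hexadecimal letters a through f using both the plain and the bracketed (System V style) forms. The input string and variable names are assumptions made for illustration only.

```shell
# Plain ranges: a-f expands to abcdef, A-F to ABCDEF.
plain=$(printf 'deadbeef 42\n' | tr 'a-f' 'A-F')

# Bracketed ranges: the brackets are ordinary characters that map to
# themselves, so the translation of the letters is the same.
bracketed=$(printf 'deadbeef 42\n' | tr '[a-f]' '[A-F]')

printf '%s\n' "$plain"       # DEADBEEF 42
printf '%s\n' "$bracketed"   # DEADBEEF 42
```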
Repeated characters. The notation `[c*n]' in string2
expands to n copies of character c. Thus, `[y*6]' is the
same as `yyyyyy'. The notation `[c*]' in string2 expands to
as many copies of c as are needed to make set2 as long as
set1. If n begins with a 0, it is interpreted in octal,
otherwise in decimal.
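A small sketch of the repeat notation, assuming a tr that accepts these constructs in string2 as described above. The sample inputs and variable names are made up for the example.

```shell
# [x*3] expands to xxx, so a, b and c are each translated to x.
fixed=$(printf 'abc\n' | tr 'abc' '[x*3]')

# [#*] expands to as many # characters as set1 needs (ten here),
# so every digit becomes #.
filled=$(printf 'call 555-0199\n' | tr '0-9' '[#*]')

printf '%s\n' "$fixed"    # xxx
printf '%s\n' "$filled"   # call ###-####
```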
Character classes. The notation `[:class-name:]' expands to
all of the characters in the (predefined) class named
class-name. The characters expand in no particular order,
except for the `upper' and `lower' classes, which expand in
ascending order. When the --delete (-d) and
--squeeze-repeats (-s) options are both given, any character
class can be used in string2. Otherwise, only the character
classes `lower' and `upper' are accepted in string2, and
then only if the corresponding character class (`upper' and
`lower', respectively) is specified in the same relative
position in string1. Doing this specifies case conversion.
The class names are given below; an error results when an
invalid class name is given.
alnum
Letters and digits.
alpha
Letters.
blank
Horizontal whitespace.
cntrl
Control characters.
digit
Digits.
graph
Printable characters, not including space.
lower
Lowercase letters.
print
Printable characters, including space.
punct
Punctuation characters.
space
Horizontal or vertical whitespace.
upper
Uppercase letters.
xdigit
Hexadecimal digits.
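The classes listed above can be used wherever a set is expected. The sketch below deletes digits and punctuation from two invented sample strings; the variable names are illustrative only.

```shell
# Delete every character in the digit class.
no_digits=$(printf 'a1b22c333\n' | tr -d '[:digit:]')

# Delete every character in the punct class.
no_punct=$(printf 'Hello, world!\n' | tr -d '[:punct:]')

printf '%s\n' "$no_digits"   # abc
printf '%s\n' "$no_punct"    # Hello world
```

The class names are quoted so that the shell does not treat the square brackets as a filename pattern.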
Equivalence classes. The syntax `[=c=]' expands to all of
the characters that are equivalent to c, in no particular
order. Equivalence classes are a recent invention intended
to support non-English alphabets. But there seems to be no
standard way to define them or determine their contents.
Therefore, they are not fully implemented in GNU tr; each
character's equivalence class consists only of that
character, which makes this a useless construction currently.
TRANSLATING
tr performs translation when string1 and string2 are both
given and the --delete (-d) option is not given. tr
translates each character of its input that is in set1 to
the corresponding character in set2. Characters not in set1
are passed through unchanged. When a character appears more
than once in set1 and the corresponding characters in set2
are not all the same, only the final one is used. For
example, these two commands are equivalent:
tr aaa xyz
tr a z
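To see the "final one wins" rule in action, the two commands can be run on the same input. This is a sketch assuming GNU tr behaviour as described above; other implementations may resolve duplicates differently. The input word and variable names are invented.

```shell
# a appears three times in set1; only the last pairing (a -> z) is used.
repeated=$(printf 'banana\n' | tr aaa xyz)
single=$(printf 'banana\n' | tr a z)

printf '%s\n' "$repeated"   # bznznz
printf '%s\n' "$single"     # bznznz
```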
A common use of tr is to convert lowercase characters to
uppercase. This can be done in many ways. Here are three
of them:
tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
tr a-z A-Z
tr '[:lower:]' '[:upper:]'
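Run on the same input, the three forms give the same result for unaccented ASCII letters. The sample text and variable names below are assumptions for illustration.

```shell
# Three equivalent ways to convert lowercase to uppercase.
spelled_out=$(printf 'Hello, World\n' |
    tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ)
by_range=$(printf 'Hello, World\n' | tr a-z A-Z)
by_class=$(printf 'Hello, World\n' | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$spelled_out"   # HELLO, WORLD
printf '%s\n' "$by_range"      # HELLO, WORLD
printf '%s\n' "$by_class"      # HELLO, WORLD
```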
When tr is performing translation, set1 and set2 should nor-
mally have the same length. If set1 is shorter than set2,
the extra characters at the end of set2 are ignored.
On the other hand, making set1 longer than set2 is not port-
able; POSIX.2 says that the result is undefined. In this
situation, the BSD tr pads set2 to the length of set1 by
repeating the last character of set2 as many times as neces-
sary. The System V tr truncates set1 to the length of set2.
By default, GNU tr handles this case like the BSD tr does.
When the --truncate-set1 (-t) option is given, GNU tr
handles this case like the System V tr instead. This option is
ignored for operations other than translation.
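The difference between the two behaviours can be sketched as follows, assuming GNU tr (the -t option is not available in every tr implementation). The input and variable names are made up for the example.

```shell
# Default (BSD-like): set2 is padded with its last character,
# so xy is treated as xyyy.
padded=$(printf 'abcd\n' | tr abcd xy)

# With -t (System V-like): set1 is truncated to ab,
# so c and d pass through unchanged.
truncated=$(printf 'abcd\n' | tr -t abcd xy)

printf '%s\n' "$padded"      # xyyy
printf '%s\n' "$truncated"   # xycd
```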
Acting like the System V tr in this case breaks the rela-
tively common BSD idiom:
tr -cs A-Za-z0-9 '\012'
because it converts only zero bytes (the first element in
the complement of set1), rather than all non-alphanumerics,
to newlines.
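A sketch of the idiom and of how truncation breaks it, assuming GNU tr. The sample sentence and variable names are invented for illustration.

```shell
# BSD-style padding: every non-alphanumeric becomes a newline, and
# runs of newlines are squeezed, leaving one word per line.
words=$(printf 'one, two;three\n' | tr -cs A-Za-z0-9 '\012')

# System V-style truncation (-t): only the zero byte would be
# translated, so this input passes through unchanged.
sysv=$(printf 'one, two;three\n' | tr -t -cs A-Za-z0-9 '\012')

printf '%s\n' "$words"   # one, two and three on separate lines
printf '%s\n' "$sysv"    # one, two;three
```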
SQUEEZING REPEATS AND DELETING
When given just the --delete (-d) option, tr removes any
input characters that are in set1.
When given just the --squeeze-repeats (-s) option, tr
replaces each input sequence of a repeated character that is
in set1 with a single occurrence of that character.
When given both the --delete and the --squeeze-repeats
options, tr first performs any deletions using set1, then
squeezes repeats from any remaining characters using set2.
The --squeeze-repeats option may also be used when
translating, in which case tr first performs translation, then
squeezes repeats from any remaining characters using set2.
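The two combined modes can be sketched as follows. In both cases the squeeze step uses set2, not set1. The inputs and variable names are assumptions made for the example.

```shell
# Delete, then squeeze: remove every a (set1), then collapse
# repeated c characters (set2).
deleted_then_squeezed=$(printf 'aabbccdd\n' | tr -ds 'a' 'c')

# Translate, then squeeze: map a and b to x, then collapse
# repeated x characters (set2).
translated_then_squeezed=$(printf 'aabbcc\n' | tr -s 'ab' 'xx')

printf '%s\n' "$deleted_then_squeezed"      # bbcdd
printf '%s\n' "$translated_then_squeezed"   # xcc
```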
Here are some examples to illustrate various combinations of
options:
Remove all zero bytes:
tr -d '\000'
Put all words on lines by themselves. This converts all
non-alphanumeric characters to newlines, then squeezes each
string of repeated newlines into a single newline:
tr -cs '[a-zA-Z0-9]' '[\n*]'
Convert each sequence of repeated newlines to a single new-
line:
tr -s '\n'
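The three examples above can be exercised on small inputs, as in this sketch. The sample data and variable names are invented, and printf is used to generate the zero bytes and newlines.

```shell
# Remove all zero bytes.
no_nul=$(printf 'a\000b\000c\n' | tr -d '\000')

# Put each word on its own line.
one_per_line=$(printf 'hello, world\n' | tr -cs '[a-zA-Z0-9]' '[\n*]')

# Collapse runs of newlines into a single newline.
squeezed=$(printf 'a\n\n\nb\n' | tr -s '\n')

printf '%s\n' "$no_nul"         # abc
printf '%s\n' "$one_per_line"   # hello and world on separate lines
printf '%s\n' "$squeezed"       # a and b on adjacent lines
```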
WARNING MESSAGES
Setting the environment variable POSIXLY_CORRECT turns off
several warning and error messages, for strict compliance
with POSIX.2. The messages normally occur in the following
circumstances:
1. When the --delete option is given but --squeeze-repeats
is not, and string2 is given, GNU tr by default prints a
usage message and exits, because string2 would not be used.
The POSIX specification says that string2 must be ignored in
this case. Silently ignoring arguments is a bad idea.
2. When an ambiguous octal escape is given. For example,
\400 is actually \40 followed by the digit 0, because the
value 400 octal does not fit into a single byte.
Note that GNU tr does not provide complete BSD or System V
compatibility. For example, there is no option to disable
interpretation of the POSIX constructs [:alpha:], [=c=], and
[c*10]. Also, GNU tr does not delete zero bytes automati-
cally, unlike traditional UNIX versions, which provide no
way to preserve zero bytes.
The long-named options can be introduced with `+' as well as
`--', for compatibility with previous releases. Eventually
support for `+' will be removed, because it is incompatible
with the POSIX.2 standard.